Unsupervised Learning - K-Means Clustering and Hierarchical Clustering - The Heritage Foundation's Economic Freedom Index Analysis 2019 - By David Salako.

Background and Context

Created in 1995 by The Heritage Foundation, the Index of Economic Freedom ranks the countries of the world by their degree of economic freedom.

Now in its 25th edition, the Index of Economic Freedom is poised to help readers track over two decades of advances in economic freedom, prosperity, and opportunity, and to promote these ideas in their homes, schools, and communities.

The Index covers 12 freedoms, from property rights to financial freedom, in 186 countries.

Objective:

As a data scientist, I have been tasked with (1) analyzing the data, (2) using clustering algorithms to identify different groups of countries based on economic freedom, and (3) listing the insights from the analysis.

Data Dictionary & Description:

The data comprises factors indicating economic freedom. The list of variables in the data is given below. All these features are self-explanatory and more details can be found in the data source listed below.

* CountryID
* Country Name
* WEBNAME
* Region
* World Rank
* Region Rank
* 2019 Score
* Property Rights
* Judical Effectiveness
* Government Integrity
* Tax Burden
* Gov't Spending
* Fiscal Health
* Business Freedom
* Labor Freedom
* Monetary Freedom
* Trade Freedom
* Investment Freedom
* Financial Freedom
* Tariff Rate (%)
* Income Tax Rate (%)
* Corporate Tax Rate (%)
* Tax Burden % of GDP
* Gov't Expenditure % of GDP
* Country
* Population (Millions)
* GDP (Billions, PPP)
* GDP Growth Rate (%)
* 5 Year GDP Growth Rate (%)
* GDP per Capita (PPP)
* Unemployment (%)
* Inflation (%)
* FDI Inflow (Millions)
* Public Debt (% of GDP)

Data Source

This dataset belongs to The Heritage Foundation and is freely available to download on their website (https://www.heritage.org/index/ranking).

The Index of Economic Freedom considers every component equally important in achieving the positive benefits of economic freedom.

Each freedom is weighted equally in determining country scores.

Countries considering economic reforms may find significant opportunities for improving economic performance in those factors in which they score the lowest.

These factors may indicate significant binding constraints on economic growth and prosperity.

Importing the necessary libraries.
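A minimal sketch of an import cell for this kind of analysis is shown below. The library choices (pandas, scikit-learn, SciPy), the helper `load_index_data`, its latin-1 default encoding, and the CSV file name are assumptions, not the author's exact code.

```python
# Core libraries for data handling, scaling, and clustering.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def load_index_data(path, encoding="latin-1"):
    """Read the Heritage Foundation CSV export into a DataFrame.

    The latin-1 default is an assumption: the raw file contains accented
    country names such as "Côte d'Ivoire" and may not be UTF-8 encoded.
    """
    return pd.read_csv(path, encoding=encoding)


# Hypothetical file name; point this at wherever the downloaded CSV lives.
# EFIndexData_Orig = load_index_data("economic_freedom_index2019_data.csv")
```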

The dataset has 33 columns and 186 rows of data representing 186 countries.

The EFIndexData_Orig dataframe contains every country's economic freedom index and its ranking in 2019. It also displays the related economic indices like Property Rights, Judicial Effectiveness, GDP, etc. The column "2019 Score" is the economic freedom index, which measures the degree of economic freedom in the 186 nations.

All 32 column names are now consistently formatted.
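One way to produce snake_case names like those used in this notebook is sketched below: a generic rule plus a small override table. The override entries are hypothetical reconstructions based on the column names that appear later in this analysis; the author's actual mapping is not shown.

```python
import re

import pandas as pd

# Hypothetical overrides for headers whose notebook names do not follow the generic rule.
COLUMN_OVERRIDES = {
    "Judical Effectiveness": "Judicial_Effectiveness",  # the raw header is misspelled
    "Population (Millions)": "Population_in_Millions",
    "GDP (Billions, PPP)": "GDP_in_Billions_by_PPP_in_USA_Dollars",
    "GDP per Capita (PPP)": "GDP_per_Capita_by_PPP_in_USA_Dollars",
    "FDI Inflow (Millions)": "FDI_Inflow_in_Millions",
}


def to_snake_case(name):
    """'Tariff Rate (%)' -> 'Tariff_Rate_Percent', "Gov't Spending" -> 'Govt_Spending'."""
    name = name.strip().replace("%", "Percent").replace("'", "")
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # collapse spaces/punctuation into one underscore
    return name.strip("_")


def tidy_columns(df):
    """Rename every column, preferring an explicit override when one exists."""
    return df.rename(columns=lambda col: COLUMN_OVERRIDES.get(col, to_snake_case(col)))
```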

Checking for missing values in the data set.

The 13 countries with missing data are either small nations with low populations (Liechtenstein, Kiribati, et al.), isolated (North Korea), or experiencing long-term conflict (Yemen, Syria, Somalia, et al.); none of these characteristics contributes substantially to a model designed around global-level economic freedom index analysis. The data collected from isolated and unstable countries are often estimates that may not be reliable, so these countries are outliers. The missing attributes for these territories include "2019 Score", "World Rank", "Region Rank", "FDI_Inflow_in_Millions", and "Unemployment Percent", none of which can be credibly imputed.
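Dropping these countries amounts to removing every row that has at least one missing value. A minimal sketch follows; the helper name and the `Country_Name` column name are hypothetical, and the toy frame uses made-up values rather than the Heritage data.

```python
import pandas as pd


def drop_incomplete_countries(df, name_col="Country_Name"):
    """Drop every row with at least one missing value and report which countries were removed."""
    incomplete = df.isna().any(axis=1)
    removed = df.loc[incomplete, name_col].tolist()
    return df.loc[~incomplete].reset_index(drop=True), removed


# Illustrative toy frame (made-up values), not the Heritage data.
toy = pd.DataFrame({
    "Country_Name": ["Country A", "Country B", "Country C"],
    "2019_Score": [70.1, None, 55.0],
    "Unemployment_Percent": [4.0, 6.5, None],
})
print(toy.isna().sum())  # missing-value count per column
complete, removed = drop_incomplete_countries(toy)
print(removed)  # ['Country B', 'Country C']
```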


Population_in_Millions, GDP_in_Billions_by_PPP_in_USA_Dollars, GDP_per_Capita_by_PPP_in_USA_Dollars, Unemployment_Percent, and FDI_Inflow_in_Millions are of object type in the data but are actually numeric in nature.

There are no duplicate rows in this data set.

No missing values remain in the data set.
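Both checks can be reproduced with a small helper like the one below (the name `quality_report` is hypothetical). `DataFrame.duplicated` flags each row that repeats an earlier row in every column.

```python
import pandas as pd


def quality_report(df):
    """Count fully duplicated rows and remaining missing cells."""
    return {
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_cells": int(df.isna().sum().sum()),
    }
```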

Data Preprocessing.

Processing columns.

1. Population_in_Millions

2. GDP_in_Billions_by_PPP_in_USA_Dollars and GDP_per_Capita_by_PPP_in_USA_Dollars

3. FDI_Inflow_in_Millions
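A sketch of how such text columns can be converted to numbers is shown below. The raw formats it strips (a leading "$", thousands separators, and parenthetical notes such as "(2016 est.)") are assumptions about the export, not confirmed details; anything still unparseable becomes NaN rather than raising an error.

```python
import pandas as pd

# The five text columns named earlier in this notebook.
TEXT_NUMERIC_COLUMNS = [
    "Population_in_Millions",
    "GDP_in_Billions_by_PPP_in_USA_Dollars",
    "GDP_per_Capita_by_PPP_in_USA_Dollars",
    "Unemployment_Percent",
    "FDI_Inflow_in_Millions",
]


def coerce_numeric(df, columns=TEXT_NUMERIC_COLUMNS):
    """Return a copy of df with the given text columns parsed as floats."""
    out = df.copy()
    for col in columns:
        text = out[col].astype(str)
        text = text.str.replace(r"\(.*?\)", "", regex=True)  # drop notes such as "(2016 est.)", if present
        text = text.str.replace(r"[$,\s]", "", regex=True)    # drop "$", thousands separators, whitespace
        out[col] = pd.to_numeric(text, errors="coerce")    # unparseable leftovers become NaN
    return out
```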

For this analysis, I will keep the remaining columns in the data set at this point, even though there will most likely be high multicollinearity due to the way the "2019_Score" Economic Freedom Index is derived. As this is not a regression exercise, the collinearity between some of the independent variables is not a major concern. Additionally, some of the variables, such as those covering GDP, taxation, and governance, are close in nature to one another, but each is an individually important piece of information that the Heritage Foundation has compiled.

The zeros that appear in variables such as "Income_Tax_Rate_Percent", "Corporate_Tax_Rate_Percent", "GDP_growth_Rate_Percent", "Fiscal_Health", "Investment_Fredom" etc. will remain as they are valid values for the countries in question.

Exploratory Data Analysis (EDA).

Observations

Univariate Analysis.

Observations

The remaining fields are either approximately normally distributed or categorical (non-numeric) variables.
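Sample skewness offers a quick numeric complement to eyeballing the histograms. The sketch below is a stand-in, not necessarily what the notebook did: the helper name and the |skew| > 1 cut-off (a common rule of thumb) are assumptions. Non-numeric columns are excluded automatically.

```python
import pandas as pd


def classify_by_skew(df, threshold=1.0):
    """Label each numeric column 'skewed' or 'roughly symmetric' by its sample skewness."""
    skewness = df.select_dtypes("number").skew()
    return skewness.apply(lambda s: "skewed" if abs(s) > threshold else "roughly symmetric")
```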

Observations

The curves show similar information to the boxplots and histograms above.

Observations

Observations

According to the Times World Atlas, there are:

Apart from Asia-Pacific, the data set includes more than 90% of the countries in every region.

Bivariate Analysis

Checking for correlations.
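Alongside a correlation heatmap, strongly correlated pairs can be listed directly. A minimal sketch follows; the helper name and the 0.8 threshold are illustrative assumptions. Only the upper triangle of the Pearson correlation matrix is scanned, so each pair appears once and self-correlations are skipped.

```python
import pandas as pd


def highly_correlated_pairs(df, threshold=0.8):
    """Return feature pairs whose |Pearson r| is at least threshold, strongest first."""
    corr = df.select_dtypes("number").corr()
    cols = corr.columns
    rows = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only: each pair once, no diagonal
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                rows.append((cols[i], cols[j], r))
    out = pd.DataFrame(rows, columns=["feature_1", "feature_2", "pearson_r"])
    return out.sort_values("pearson_r", key=lambda s: s.abs(), ascending=False).reset_index(drop=True)
```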

Observations

As noted earlier in this analysis, there are several correlated variables in this data set; this is by design, and most of the correlations are obvious. Some examples of highly correlated columns are:

Observations

Observations

Which region has the largest standard deviation in the 2019 Economic Freedom Index?
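The question can be answered with a single group-by. In the sketch below, the helper name and the column names `Region` and `2019_Score` are assumed to match the notebook's formatted names. pandas computes the sample standard deviation (ddof=1).

```python
import pandas as pd


def score_spread_by_region(df, region_col="Region", score_col="2019_Score"):
    """Standard deviation of the index score within each region, largest first."""
    return df.groupby(region_col)[score_col].std().sort_values(ascending=False)
```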

Observations

We will drop the Tariff_Rate_Percent column from the data.

Data Preprocessing

Scaling
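Distance-based clustering is sensitive to feature scale, so each numeric column is typically standardized to mean 0 and standard deviation 1. The sketch below assumes `StandardScaler` was the scaler used. Dropping `Tariff_Rate_Percent` follows the text above; the identifier/rank column names in the drop list are hypothetical and included only because such fields label a country rather than describe it.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tariff_Rate_Percent is dropped as stated in the text; the other names are hypothetical.
NON_FEATURE_COLUMNS = ["Tariff_Rate_Percent", "CountryID", "World_Rank", "Region_Rank"]


def scale_features(df, drop_cols=NON_FEATURE_COLUMNS):
    """Z-score every remaining numeric column (non-numeric columns are left out)."""
    features = df.drop(columns=[c for c in drop_cols if c in df.columns]).select_dtypes("number")
    scaled = StandardScaler().fit_transform(features)  # per column: subtract mean, divide by std
    return pd.DataFrame(scaled, columns=features.columns, index=df.index)
```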

Clustering

K-means Clustering

Observations

The appropriate value of k from the Elbow curve seems to be 3, 5, 6, or 7.
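The elbow curve plots K-means inertia (within-cluster sum of squared distances) against k and looks for the point where further increases in k stop paying off. A minimal sketch of the computation follows; the helper name, the k range, and `random_state=1` are assumptions.

```python
from sklearn.cluster import KMeans


def elbow_inertias(X, k_values=range(1, 11), random_state=1):
    """Within-cluster sum of squares (inertia) for each candidate k."""
    inertias = {}
    for k in k_values:
        # n_init is set explicitly because its default changed across scikit-learn versions
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        inertias[k] = model.inertia_
    return inertias
```

Plotting the returned values against k gives the elbow curve; several plausible elbows, as observed here, are common with real data.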

Check the silhouette scores.

Observations

From the silhouette scores, 4 seems to be a reasonable value for k.
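The silhouette score averages, over all points, how much closer each point is to its own cluster than to the nearest other cluster (range -1 to 1, higher is better). A sketch of scoring several values of k follows; the helper name and k range are assumptions. k starts at 2 because the score is undefined for a single cluster.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values=range(2, 11), random_state=1):
    """Mean silhouette score of a K-means labeling for each candidate k."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores
```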

Silhouette Plot
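A silhouette plot draws one bar per point, grouped by cluster. The numbers behind it can be summarized per cluster as sketched below (the helper name is hypothetical). Clusters with a low mean or negative minimum silhouette contain points that may sit closer to a neighboring cluster.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import silhouette_samples


def silhouette_summary(X, labels):
    """Per-cluster mean and minimum silhouette and cluster size."""
    values = silhouette_samples(X, labels)
    return pd.Series(values, name="silhouette").groupby(np.asarray(labels)).agg(["mean", "min", "size"])
```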

We will proceed with k=4.

Check additional internal and external performance evaluation scores.
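Internal measures use only the data and the labels: silhouette and Calinski-Harabasz are higher-is-better, while Davies-Bouldin is lower-is-better. External measures compare the labels with a reference labeling. Which reference the notebook used is not stated, so the `reference_labels` argument here (for example, the `Region` column) is an assumption, as is the helper name.

```python
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    normalized_mutual_info_score,
    silhouette_score,
)


def evaluate_clustering(X, labels, reference_labels=None):
    """Internal scores always; external scores only when a reference labeling is given."""
    scores = {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }
    if reference_labels is not None:
        scores["adjusted_rand"] = adjusted_rand_score(reference_labels, labels)
        scores["normalized_mutual_info"] = normalized_mutual_info_score(reference_labels, labels)
    return scores
```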

Observations

Using 3 clusters, the evaluation measures give the scores detailed above.

Hierarchical Clustering
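A common workflow for hierarchical clustering compares linkage methods by their cophenetic correlation (how faithfully the dendrogram preserves the original pairwise distances) and then cuts the tree into a fixed number of clusters. The sketch below uses SciPy with Euclidean distance. The helper names, the candidate methods, and the default of 4 clusters with Ward linkage are assumptions; scikit-learn's `AgglomerativeClustering` is an equivalent alternative.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist


def cophenetic_by_linkage(X, methods=("single", "complete", "average", "ward")):
    """Cophenetic correlation of each linkage method against the Euclidean distances."""
    distances = pdist(X)  # condensed Euclidean distance matrix
    results = {}
    for method in methods:
        Z = linkage(X, method=method)
        c, _ = cophenet(Z, distances)
        results[method] = c
    return results


def hierarchical_labels(X, n_clusters=4, method="ward"):
    """Cut the dendrogram into n_clusters flat clusters."""
    Z = linkage(X, method=method)
    # fcluster numbers clusters from 1; subtract 1 for 0-based labels
    return fcluster(Z, t=n_clusters, criterion="maxclust") - 1
```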

Observations

Clustering Profiling and Comparison
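Cluster profiling usually means attaching the labels to the original (unscaled) data and averaging each numeric variable per cluster. A minimal sketch follows; the helper and the `Cluster` column name are hypothetical. It works the same way for K-means and hierarchical labels.

```python
import pandas as pd


def profile_clusters(df, labels, label_name="Cluster"):
    """Mean of every numeric column per cluster, plus the number of members."""
    profiled = df.assign(**{label_name: list(labels)})
    numeric = profiled.select_dtypes("number").columns.drop(label_name, errors="ignore")
    summary = profiled.groupby(label_name)[list(numeric)].mean()
    summary["Count"] = profiled.groupby(label_name).size()
    return summary
```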

K-means Clustering

Observations

Insights

Hierarchical Clustering

Observations

Insights

K-Means Clustering vs Hierarchical Clustering Comparison
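Two labelings can be compared with a cross-tabulation, which shows where countries land under each method, and with the adjusted Rand index, which is 1.0 when the partitions agree up to a renaming of cluster numbers. A sketch follows; the helper name is hypothetical.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score


def compare_labelings(kmeans_labels, hc_labels):
    """Cross-tab of K-means vs hierarchical labels and their adjusted Rand index."""
    table = pd.crosstab(pd.Series(kmeans_labels, name="KMeans"), pd.Series(hc_labels, name="Hierarchical"))
    agreement = adjusted_rand_score(kmeans_labels, hc_labels)
    return table, agreement
```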

Observations

Observations

Compare Hierarchical Clusters vs Various Attributes in the Data Set.

Observations

Observations

The tables above break down the distribution of the countries within each cluster along the scale of the variable in question. For example, for "Government_Integrity", Cluster 1 has the highest scores, Cluster 0 the lowest, and Cluster 2 sits in the middle. The unique trio in Cluster 3 report "Government_Integrity" scores of 47.8 (China), 49.1 (India), and 77.4 (U.S.A.) respectively. Similar distributions are seen for "Property_Rights", "Judicial_Effectiveness", "Fiscal_Health", and the five "Freedom" variables.
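A compact way to produce tables like these for any single variable is to describe it per cluster and order the clusters by their median. In the sketch below, the helper name and the cluster column name `HC_Clusters` are hypothetical.

```python
import pandas as pd


def variable_by_cluster(df, variable, cluster_col="HC_Clusters"):
    """Count, min, median, and max of one variable per cluster, highest median first."""
    summary = df.groupby(cluster_col)[variable].describe()
    return summary[["count", "min", "50%", "max"]].sort_values("50%", ascending=False)
```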

These illustrations further highlight and add to the analyses done earlier.

Observations

Which countries have the highest "2019_Score" within each Hierarchical cluster?

Which countries have the lowest "2019_Score" within each Hierarchical cluster?
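Both questions reduce to `idxmax`/`idxmin` within each cluster. A sketch follows; the helper name and the column names `2019_Score`, `HC_Clusters`, and `Country_Name` are assumptions, and the DataFrame index is assumed to be unique.

```python
import pandas as pd


def extreme_countries(df, score_col="2019_Score", cluster_col="HC_Clusters", name_col="Country_Name"):
    """Highest- and lowest-scoring country in each cluster."""
    grouped = df.groupby(cluster_col)
    cols = [cluster_col, name_col, score_col]
    best = df.loc[grouped[score_col].idxmax(), cols].set_index(cluster_col)
    worst = df.loc[grouped[score_col].idxmin(), cols].set_index(cluster_col)
    return best, worst
```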

Observations

Insights

I have lived and worked on three continents and have traveled to five. Economic conditions and standards of living vary widely across the world, and I have often wondered why. Strong institutions and robust policies are frequently mentioned as key to enabling sustained economic growth and rapid development.

Some countries are very organized and have well-thought-out laws that are enforceable through capable and independent legislative and judicial systems. Other nations suffer from endemic corruption, which can discourage both domestic and foreign investors. Additionally, some governmental and political environments make it very challenging for private enterprise to succeed and thrive.

Several organizations compile economic data to quantitatively measure the quality of governance and the economic climate across the many nations and territories of the world. The Heritage Foundation, a Washington, D.C.-based think tank, publishes the Index of Economic Freedom, which covers 12 freedoms, from property rights to financial freedom, in 184 countries. For twenty-seven years, the Index has delivered thoughtful analysis in a clear, friendly, and straightforward format.

Studies have demonstrated that this Index corresponds strongly in many ways to economic growth. Economic theory predicts that increased freedom will almost certainly lead to improved prosperity.

===============================================================================================================

I have used the 2019 Economic Freedom Index data set to group the reporting nations into high-level clusters based on shared characteristics. As expected, there is also diversity within the clusters. The 4 clusters created using the hierarchical clustering method are:

Developed Countries (Hierarchical Cluster 1 in alphabetical order)

'Australia', 'Austria', 'Belgium', 'Canada', 'Chile', 'Croatia', 'Cyprus', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hong Kong', 'Iceland', 'Ireland', 'Israel', 'Italy', 'Japan', 'Korea, South', 'Luxembourg', 'Netherlands', 'New Zealand', 'Norway', 'Portugal', 'Singapore', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Taiwan', and 'United Kingdom'.

'Hong Kong' has the highest '2019 Score' in this cluster and 'Greece' has the lowest score among the developed countries.

Some of the characteristics of the developed countries are high freedom scores (financial, investment, trade, monetary, and labor), fiscal health scores above 51.0, and tax burden scores mostly below 76.8, with Singapore and Hong Kong as outliers in the 90s. Furthermore, high judicial effectiveness (above 46.5) and property rights (52.4 and higher) seem to work in tandem with government integrity scores of 50.5 and above, with Croatia the outlier at 38.6.

Income tax rates are high, with most of the developed countries above 31.3%; corporate tax rates fall in a middle range, between 12.5% and 35.0%. Inflation rates are low (0.2% to 2.7%), while unemployment rates are spread across the spectrum.



Developing Countries - Upper Tier (Hierarchical Cluster 2 in alphabetical order)



'Albania', 'Armenia', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Barbados', 'Belarus', 'Bhutan', 'Bosnia and Herzegovina', 'Botswana', 'Brunei Darussalam', 'Bulgaria', 'Cabo Verde', 'Colombia', 'Costa Rica', 'Czech Republic', 'Eswatini', 'Fiji', 'Georgia', 'Hungary', 'Indonesia', 'Jamaica', 'Jordan', 'Kazakhstan', 'Kuwait', 'Kyrgyz Republic', 'Latvia', 'Lesotho', 'Lithuania', 'Macau', 'Macedonia', 'Malaysia', 'Maldives', 'Malta', 'Mauritius', 'Mexico', 'Mongolia', 'Montenegro', 'Morocco', 'Namibia', 'Oman', 'Panama', 'Peru', 'Poland', 'Qatar', 'Romania', 'Russia', 'Rwanda', 'Saint Lucia', 'Saint Vincent and the Grenadines', 'Samoa', 'Saudi Arabia', 'Serbia', 'Slovakia', 'South Africa', 'Thailand', 'Tonga', 'Trinidad and Tobago', 'Turkey', 'United Arab Emirates', 'Uruguay', and 'Vanuatu'.

'United Arab Emirates' has the highest '2019 Score' in this cluster and 'Lesotho' has the lowest score among the developing countries - upper tier.



Some of the characteristics of the upper-tier developing countries are middle-tier freedom scores (financial, investment, trade, monetary, and labor), fiscal health scores ranging from 3.7 (Bahrain) to 100 (Macau), and tax burden scores ranging from 62.1 (South Africa) to 99.8 (Saudi Arabia). Furthermore, judicial effectiveness scores mostly range from 23.8 to 68.2, with Rwanda and the U.A.E. above 80, and property rights (37.6 to 84.1) seem to work in tandem with government integrity scores spanning 23.4 to 78.8 (U.A.E.).

Income tax rates are 0.0% in several oil-rich Middle Eastern nations, the Bahamas, and Vanuatu, and top out at 38.0% in Morocco; corporate tax rates range from 0.0% to 33%, with ten nations at 25.0%. Inflation rates range from -0.9% (deflation) in Saudi Arabia to 13.0% in Azerbaijan, and unemployment rates range from 0.1% to 9.7% (Bahamas).



Developing Countries - Lower Tier - Emerging Economies (Hierarchical Cluster 0 in alphabetical order)



'Afghanistan', 'Algeria', 'Angola', 'Argentina', 'Bangladesh', 'Belize', 'Benin', 'Bolivia', 'Brazil', 'Burkina Faso', 'Burma', 'Burundi', 'Cambodia', 'Cameroon', 'Central African Republic', 'Chad', 'Comoros', 'Congo, Democratic Republic of the Congo', 'Congo, Republic of', "Côte d'Ivoire", 'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Ethiopia', 'Gabon', 'Gambia', 'Ghana', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Iran', 'Kenya', 'Laos', 'Lebanon', 'Liberia', 'Madagascar', 'Malawi', 'Mali', 'Mauritania', 'Moldova', 'Mozambique', 'Nepal', 'Nicaragua', 'Niger', 'Nigeria', 'Pakistan', 'Papua New Guinea', 'Paraguay', 'Philippines', 'São Tomé and Príncipe', 'Senegal', 'Sierra Leone', 'Solomon Islands', 'Sri Lanka', 'Sudan', 'Suriname', 'Tajikistan', 'Tanzania', 'Timor-Leste', 'Togo', 'Tunisia', 'Turkmenistan', 'Uganda', 'Ukraine', 'Uzbekistan', 'Venezuela', 'Vietnam', 'Zambia', and 'Zimbabwe'.
The 'Philippines' has the highest '2019 Score' in this cluster and 'Venezuela' has the lowest score among the developing countries - lower tier - emerging economies.

Some of the characteristics of the lower-tier developing countries (emerging economies) are low freedom scores (financial, investment, trade, monetary, and labor), fiscal health scores ranging from 0.0 (Republic of Congo, Egypt, and The Gambia) to 99.3 (Afghanistan), and tax burden scores ranging from 46.1 (Chad) to 96.3 (Timor-Leste). Furthermore, low judicial effectiveness scores, from 12.3 (Bolivia) to 52.1 (Tajikistan), and property rights scores, from 7.6 (Venezuela) to 59.2 (Tonga), seem to work in tandem with government integrity scores spanning 7.9 (Venezuela) to 41.2 (The Gambia).

Income tax rates start at 13.0% in Belarus, Tajikistan, and Bolivia; corporate tax rates range from 7.5% to 50%, with sixteen nations at 25.0% and 22 nations at 30.0%. Inflation rates range from -0.9% (deflation) in Chad to 1087.5% in Venezuela, and unemployment rates range from 0.2% in Cambodia to 25.0% in Mozambique.



Large Countries - China, India, and the U.S.A. (Hierarchical Cluster 3)



China is the world's most populous country at 1.4 billion, followed by India with 1.3 billion and the U.S.A. with 331 million, according to the July 2021 estimates published by the U.S. Census Bureau. These three countries also have the largest (U.S.A.), second-largest (China), and fifth-largest (India) economies in the world as measured by GDP.

They are in a category of their own: other nations with large populations do not have comparably sized economies, and other nations with large (though not comparably large) economies do not have large populations.

===============================================================================================================

There are many correlated variables, such as the aggregate '2019_Score' and the individual components of that score, as well as population size and GDP.

Assuming the Heritage Foundation's methodology is rigorous, it is observed that increases in business freedom, fiscal freedom, trade freedom, and property rights have the greatest effect on increasing a country's GDP per capita (PPP), whereas investment freedom, monetary freedom, and government spending decrease GDP per capita (PPP); note that a high government spending score corresponds to low actual spending. The government spending relationship can be explained by the definition of GDP, which in itself can be a problematic economic measure, as described at this link: https://www.economist.com/briefing/2016/04/30/the-trouble-with-gdp.



Finally, this Index has some other shortcomings: it does not take into account the vast, untapped, valuable mineral resources of many countries in Sub-Saharan Africa, South America, and Asia. Additionally, it does not account for the stable informal entrepreneurial sectors of many developing nations, nor for the substantial raw material resources that supply lucrative industries such as gemstones (jewelry), cocoa (chocolate), coltan (cell phones and computers), uranium (nuclear industry), etc.